Supporting Partial Data Replication in Distributed
Abstract
Transactional memory (TM) [8] is steadily making its way into mainstream programming: it is already being deployed by some of the major CPU manufacturers [11] and in several reference compilers [5]. To meet requirements such as scalability and dependability, recent proposals combine TM with data replication, bringing TM to distributed environments and giving rise to distributed transactional memory (DTM). However, current DTM frameworks support only full data replication [2, 10]. Full replication provides the best possible tolerance to data loss, but it limits the system's total storage capacity to that of the node with the fewest resources, and it requires coordination among all of the system's nodes, which is bound to hamper scalability in large-scale systems.
In this context, a partial data replication [1] strategy can help to mitigate these shortcomings. Each node replicates only a subset of the system's dataset, an approach that aims to combine the best of data distribution and full replication while attenuating their disadvantages. The key idea is to distribute the dataset among the participating nodes and to reduce the number of nodes that must take part in a transaction's confirmation (commit): a given transaction only has to be confirmed by the nodes that replicate the data items in its read and write sets. By distributing the data and reducing the coordination cost among nodes, partial data replication improves the system's scalability. Although this strategy has already been explored in the field of distributed databases [6], it has yet to be addressed in the context of (D)TM. In particular, partial data replication has been widely applied in key-value stores [7]; although these operate on in-memory data and support transactions, they differ significantly from DTM systems for general-purpose programming languages.
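The key idea, that a transaction only has to be confirmed by the nodes replicating the items in its read and write sets, can be illustrated with a small Java sketch (Java being the language this work targets). This is not PARdstm's API: the class `PartialReplicationSketch`, its `place` and `commitParticipants` methods, and the static item-to-nodes placement map are all hypothetical, chosen only to show how the set of nodes involved in a commit is derived under partial replication.

```java
import java.util.*;

/**
 * Hypothetical sketch (not PARdstm's API): under partial replication each
 * data item is stored on a subset of nodes, and a transaction's commit only
 * involves the nodes that replicate some item in its read or write set.
 */
final class PartialReplicationSketch {

    /** Hypothetical static placement: item key -> ids of nodes holding a replica. */
    private final Map<String, Set<Integer>> placement = new HashMap<>();

    /** Records which nodes replicate the given item. */
    void place(String item, Integer... nodes) {
        placement.put(item, new TreeSet<>(Arrays.asList(nodes)));
    }

    /**
     * Nodes that must take part in confirming a transaction: the union of the
     * replica sets of every item in its read set and its write set.
     */
    Set<Integer> commitParticipants(Set<String> readSet, Set<String> writeSet) {
        Set<Integer> participants = new TreeSet<>();
        for (String item : readSet) {
            participants.addAll(replicasOf(item));
        }
        for (String item : writeSet) {
            participants.addAll(replicasOf(item));
        }
        return participants;
    }

    /** Looks up an item's replica set, rejecting items with no known placement. */
    private Set<Integer> replicasOf(String item) {
        Set<Integer> replicas = placement.get(item);
        if (replicas == null) {
            throw new IllegalArgumentException("unknown item: " + item);
        }
        return replicas;
    }

    public static void main(String[] args) {
        // Four nodes (ids 0..3); each item is replicated on two of them.
        PartialReplicationSketch dtm = new PartialReplicationSketch();
        dtm.place("x", 0, 1);
        dtm.place("y", 1, 2);
        dtm.place("z", 2, 3);

        // A transaction that reads x and writes y involves nodes 0, 1 and 2 only;
        // under full replication all four nodes would have to participate.
        System.out.println(dtm.commitParticipants(Set.of("x"), Set.of("y"))); // [0, 1, 2]
    }
}
```

The participant set grows with the number of distinct replica sets a transaction touches, so the savings over full replication depend on how well the (here hypothetical) placement keeps related items on the same nodes.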
To this end, we propose PARdstm which, to the best of our knowledge, is the first DTM framework to support partial data replication. The contributions of this work are: an analysis of how partial data replication should be supported in general-purpose programming languages (Java, in particular), and a modular software framework that embodies these principles to provide a highly expressive and non-intrusive programming API. Initial experimental results suggest that our approach may improve scalability in large-scale systems when compared to full data replication. An ongoing comprehensive study will allow us to assess in which contexts of use (workloads, number of nodes, etc.) partial data replication may be an effective alternative.